Abstract. The longest common subsequence (LCS) problem is a classic and well-studied problem in computer 
science. A palindrome is a word which reads the same forward as it does backward. The longest common palindromic 
subsequence (LCPS) problem is an interesting variant of the classic LCS problem which finds the longest common 
subsequence between two given strings such that the computed subsequence is also a palindrome. In this paper, we 
study the LCPS problem and give efficient algorithms to solve this problem. To the best of our knowledge, this is 
the first attempt to study and solve this interesting problem. 
1 Introduction 

The longest common subsequence (LCS) problem is a classic and well-studied problem in computer science, 
with many variants arising from different practical scenarios. In this paper, we introduce and study the 
longest common palindromic subsequence (LCPS) problem. A subsequence of a string is obtained by deleting 
zero or more symbols of that string. A common subsequence of two strings is a subsequence common 
to both strings. A palindrome is a word, phrase, number, or other sequence of units which reads the 
same forward as it does backward. The LCS problem for two strings is to find a common subsequence of 
both strings having the maximum possible length. In the LCPS problem, the computed common 
subsequence must also be a palindrome. More formally, given a pair of strings $X$ and $Y$ over the 
alphabet $\Sigma$, the goal of the LCPS problem is to compute a common subsequence $Z$ of maximum length 
among those common subsequences that are palindromes. In what follows, for the sake of convenience, we 
assume that $X$ and $Y$ have equal length $n$. Our results can easily be extended to two strings of different lengths. 

String and sequence algorithms related to palindromes have long attracted stringology researchers 
[1,2,4,5,8-14]. The LCPS problem is a new and interesting addition to the already rich 
list of problems related to palindromes. Apart from being interesting from a purely theoretical point of view, 
LCPS is also motivated by computational biology. Biologists believe that palindromes play an important 
role in the regulation of gene activity and other cell processes because they are often observed near 
promoters, introns and specific untranslated regions. Finding common palindromes in two genome sequences 
can therefore be an important criterion for comparing them and for finding common relationships between 
them. 

To the best of our knowledge, there exists no research work in the literature on computing longest common 
palindromic subsequences. However, the problem of computing palindromes and their variants in a single 
sequence has received much attention in the literature. Martínek and Lexa studied faster palindrome searching 
methods using hardware acceleration [12]. They showed that their results are better than software methods. 
A searching method for palindromic sequences in the primary structure of proteins was presented in [7]. 
Manacher discovered an on-line sequential algorithm that finds all 'initial' palindromes (see footnote 1) in a string [10]. 
Gusfield gave a linear-time algorithm to find all 'maximal' palindromes in a string [6]. Porto and Barbosa 
gave an algorithm to find all approximate palindromes in a string [14]. In [4], a simple web-based tool is 
presented to assist biologists in detecting palindromes in a DNA sequence. The authors of [13] solved the problem 
of finding all palindromes in SLP (Straight Line Program)-compressed strings. Additionally, a number of 
variants of palindromes have also been investigated in the literature [3, 8, 9]. Very recently, Tomohiro et al. 
worked on pattern matching problems involving palindromes [15]. 

1 A string X[1 ... n] is said to have an initial palindrome of length k if the prefix X[1 ... k] is a palindrome. 

1.1 Our Contribution 

In this paper, we introduce and study the LCPS problem. We propose two methods for finding an LCPS of 
two given strings. First, we present a dynamic programming algorithm that solves the problem in 
$O(n^4)$ time, where $n$ is the length of the strings. Then, we present another algorithm that runs in 
$O(\mathcal{R}^2 \log^3 n)$ time. Here, the set of all matches (ordered pairs of matching positions) between 
the two strings is denoted by $\mathcal{M}$ and $|\mathcal{M}| = \mathcal{R}$. 

2 Preliminaries 

We assume a finite alphabet $\Sigma$. For a string $X = x_1 x_2 \ldots x_n$, we define the string $x_i \ldots x_j$ 
($1 \le i \le j \le n$) as a substring of $X$ and denote it by $X_{i,j}$. A palindrome is a string which reads the 
same forward and backward. We say a string $Z = z_1 z_2 \ldots z_u$ is a palindrome iff $z_i = z_{u-i+1}$ for every 
$1 \le i \le \lceil u/2 \rceil$. A subsequence of a string $X$ is a sequence obtained by deleting zero or more 
characters from $X$. A subsequence $Z$ of $X$ is a palindromic subsequence if $Z$ is a palindrome. For two strings 
$X$ and $Y$, if a common subsequence $Z$ of $X$ and $Y$ is a palindrome, then $Z$ is said to be a common palindromic 
subsequence (CPS). A CPS of two strings having the maximum length is called a longest common palindromic 
subsequence (LCPS), and we denote it by $LCPS(X, Y)$. 
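
The definitions above can be checked directly on very small inputs. The following Python sketch is not one of the algorithms of this paper: the function names (`is_palindrome`, `is_subsequence`, `brute_force_lcps`) are illustrative assumptions, and the exhaustive search takes exponential time, so it is only meant as a reference for testing.

```python
from itertools import combinations


def is_palindrome(s: str) -> bool:
    """True if s reads the same forward and backward."""
    return s == s[::-1]


def is_subsequence(z: str, x: str) -> bool:
    """True if z can be obtained from x by deleting zero or more characters."""
    it = iter(x)
    # `ch in it` advances the iterator, so matches must occur in order.
    return all(ch in it for ch in z)


def brute_force_lcps(x: str, y: str) -> str:
    """Return one LCPS of x and y by trying subsequences of x, longest first."""
    for length in range(len(x), 0, -1):
        for idx in combinations(range(len(x)), length):
            z = "".join(x[i] for i in idx)
            if is_palindrome(z) and is_subsequence(z, y):
                return z
    return ""  # no common symbol at all
```

For instance, with X = "abac" and Y = "caba" the common palindromic subsequence "aba" is returned, since the only length-4 subsequence of X is not a palindrome.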

For two strings $X = x_1 x_2 \ldots x_n$ and $Y = y_1 y_2 \ldots y_n$, we define a match to be an ordered pair 
$(i, j)$ such that $X$ and $Y$ have a matching character at positions $i$ and $j$ respectively, that is, $x_i = y_j$. 
The set of all matches between two strings $X$ and $Y$ is denoted by $\mathcal{M}$ and is defined as 
$\mathcal{M} = \{(i, j) : 1 \le i \le n,\ 1 \le j \le n \text{ and } x_i = y_j\}$, and we write $|\mathcal{M}| = \mathcal{R}$. 
We define $\mathcal{M}_\sigma$ as the subset of $\mathcal{M}$ in which all matches match a single character 
$\sigma \in \Sigma$, that is, $\mathcal{M}_\sigma = \{(i, j) : 1 \le i \le n,\ 1 \le j \le n \text{ and } x_i = y_j = \sigma\}$, 
and we write $|\mathcal{M}_\sigma| = \mathcal{R}_\sigma$. Clearly, $\mathcal{M}_\sigma \subseteq \mathcal{M}$ and 
$\mathcal{M} = \bigcup_{\sigma \in \Sigma} \mathcal{M}_\sigma$. Each member of $\mathcal{M}_\sigma$ is called a $\sigma$-match. 

3 A Dynamic Programming Algorithm 

A brute-force approach to this problem would be to enumerate all the subsequences of $X$ and $Y$ and compare 
them, keeping track of the longest common palindromic subsequence found. There are $2^n$ subsequences of any 
string of length $n$, so the brute-force approach leads to an exponential-time algorithm. In this section, we 
devise a dynamic programming algorithm for the LCPS problem. We will see that the natural classes 
of subproblems for LCPS correspond to pairs of substrings of the two input sequences. We first present the 
following theorem, which establishes the optimal substructure property of the LCPS problem. 

Theorem 1. Let $X$ and $Y$ be two sequences of length $n$, and let $X_{i,j} = x_i x_{i+1} \ldots x_{j-1} x_j$ and 
$Y_{k,\ell} = y_k y_{k+1} \ldots y_{\ell-1} y_\ell$ be substrings of $X$ and $Y$ respectively. Let $Z = z_1 z_2 \ldots z_u$ 
be an LCPS of the two substrings $X_{i,j}$ and $Y_{k,\ell}$. Then the following statements hold: 

1. If $x_i = x_j = y_k = y_\ell = a$ ($a \in \Sigma$), then $z_1 = z_u = a$ and $z_2 \ldots z_{u-1}$ is an LCPS of 
$X_{i+1,j-1}$ and $Y_{k+1,\ell-1}$. 

2. If the condition $x_i = x_j = y_k = y_\ell$ does not hold, then $Z$ is an LCPS of ($X_{i+1,j}$ and $Y_{k,\ell}$) or 
($X_{i,j-1}$ and $Y_{k,\ell}$) or ($X_{i,j}$ and $Y_{k,\ell-1}$) or ($X_{i,j}$ and $Y_{k+1,\ell}$). 



Proof. (1) Since $Z$ is a palindrome by definition, we have $z_1 = z_u$. If $z_1 = z_u \ne a$, then we can append 
$a$ at both ends of $Z$ to obtain a common palindromic subsequence of $X_{i,j}$ and $Y_{k,\ell}$ of length $u + 2$, which 
contradicts the assumption that $Z$ is an LCPS of $X_{i,j}$ and $Y_{k,\ell}$. So we must have $z_1 = z_u = a$. Now, 
the substring $z_2 \ldots z_{u-1}$ of length $u - 2$ is itself a palindrome and it is common to both $X_{i+1,j-1}$ and 
$Y_{k+1,\ell-1}$. We need to show that it is an LCPS. For the purpose of contradiction, let us assume that there is 
a common palindromic subsequence $W$ of $X_{i+1,j-1}$ and $Y_{k+1,\ell-1}$ with length greater than $u - 2$. Then 
appending $a$ to both ends of $W$ produces a common palindromic subsequence of $X_{i,j}$ and $Y_{k,\ell}$ with length 
greater than $u$, which is a contradiction. 

(2) Since $Z$ is a palindrome, $z_1 = z_u$. Here the condition $x_i = x_j = y_k = y_\ell$ does not 
hold, so $z_1 = z_u$ is not equal to at least one of $x_i$, $x_j$, $y_k$ or $y_\ell$. Therefore $Z$ is a common palindromic 
subsequence of the substrings obtained by deleting at least one character from either end of $X_{i,j}$ or $Y_{k,\ell}$. 
If any pair of substrings obtained by deleting one character from either end of $X_{i,j}$ or $Y_{k,\ell}$ had a common 
palindromic subsequence $W$ with length greater than $u$, then $W$ would also be a common palindromic 
subsequence of $X_{i,j}$ and $Y_{k,\ell}$, contradicting the assumption that $Z$ is an LCPS of $X_{i,j}$ and $Y_{k,\ell}$. 

This completes the proof. □ 



From Theorem 1, we see that if $x_i = x_j = y_k = y_\ell = a$ ($a \in \Sigma$), we must find an LCPS of $X_{i+1,j-1}$ and 
$Y_{k+1,\ell-1}$ and append $a$ on both of its ends to yield an LCPS of $X_{i,j}$ and $Y_{k,\ell}$. Otherwise, we must solve four 
subproblems and take the maximum of those. These four subproblems correspond to finding an LCPS of: 

(a) $X_{i+1,j}$ and $Y_{k,\ell}$, (b) $X_{i,j-1}$ and $Y_{k,\ell}$, (c) $X_{i,j}$ and $Y_{k,\ell-1}$, and (d) $X_{i,j}$ and $Y_{k+1,\ell}$. 

Let us define $lcps[i, j, k, \ell]$ to be the length of an LCPS of $X_{i,j}$ and $Y_{k,\ell}$. If either $i > j$ or $k > \ell$, then 
one of the substrings is empty and hence the length of the LCPS is 0. So we have 

$$lcps[i, j, k, \ell] = 0 \quad \text{if } i > j \text{ or } k > \ell. \qquad (1)$$

If either of the substrings has length 1, then the obtained LCPS will have length 0 or 1, depending on 
whether that single character can form a common palindrome between the two substrings. In this case we 
have 

$$lcps[i, j, k, \ell] = 1 \quad \text{if } (i = j \text{ or } k = \ell) \text{ and (either of } x_i \text{ or } x_j \text{ equals either of } y_k \text{ or } y_\ell). \qquad (2)$$

Using the base cases of Equations 1 and 2 and the optimal substructure property of LCPS (Theorem 1), 
we have the following recursive formula: 



$$
lcps[i, j, k, \ell] =
\begin{cases}
0 & \text{if } i > j \text{ or } k > \ell \\
1 & \text{if } (i = j \text{ or } k = \ell) \text{ and (either of } x_i \text{ or } x_j \text{ equals either of } y_k \text{ or } y_\ell) \\
2 + lcps[i+1, j-1, k+1, \ell-1] & \text{if } (i < j \text{ and } k < \ell) \text{ and } x_i = x_j = y_k = y_\ell \\
\max(lcps[i+1, j, k, \ell],\ lcps[i, j-1, k, \ell],\ lcps[i, j, k+1, \ell],\ lcps[i, j, k, \ell-1]) & \text{if } (i < j \text{ and } k < \ell) \text{ and the condition } x_i = x_j = y_k = y_\ell \text{ does not hold}
\end{cases}
\qquad (3)
$$

Algorithm 1 LCPSLength(X, Y) 

 1: n ← length[X] 
 2: for i = 1 to n do 
 3:   for j = 1 to i do 
 4:     for k = 1 to n do 
 5:       for ℓ = 1 to k do 
 6:         if (i = j or k = ℓ) and (either of x_i or x_j equals either of y_k or y_ℓ) then 
 7:           lcps[i, j, k, ℓ] = 1 
 8:         else 
 9:           lcps[i, j, k, ℓ] = 0 
10:         end if 
11:       end for 
12:     end for 
13:   end for 
14: end for 
15: for xLength = 2 to n do 
16:   for yLength = 2 to n do 
17:     for i = 1 to n - xLength + 1 do 
18:       for k = 1 to n - yLength + 1 do 
19:         j = i + xLength - 1 
20:         ℓ = k + yLength - 1 
21:         if x_i = x_j = y_k = y_ℓ then 
22:           lcps[i, j, k, ℓ] = 2 + lcps[i + 1, j - 1, k + 1, ℓ - 1] 
23:         else 
24:           lcps[i, j, k, ℓ] = max(lcps[i + 1, j, k, ℓ], lcps[i, j - 1, k, ℓ], lcps[i, j, k + 1, ℓ], lcps[i, j, k, ℓ - 1]) 
25:         end if 
26:       end for 
27:     end for 
28:   end for 
29: end for 
30: return lcps 



The length of an LCPS of $X$ and $Y$ is stored at $lcps[1, n, 1, n]$. Since there are $\Theta(n^4)$ 
distinct subproblems, we can use dynamic programming to compute the solution in a bottom-up manner. 
Algorithm 1 outlines the LCPSLength procedure, which takes two sequences $X$ and $Y$ as inputs. It stores the 
$lcps[i, j, k, \ell]$ values in the table $lcps$ of size $n \times n \times n \times n$. The table entries with $i > j$ or 
$k > \ell$ have value 0, since these entries correspond to at least one empty substring. We proceed in our 
computation with increasing length of the substrings; that is, table entries for substrings of length $v$ are 
computed before those for substrings of length $v + 1$. The procedure returns the $lcps$ table, and 
$lcps[1, n, 1, n]$ contains the length of an LCPS of $X$ and $Y$. Theorem 2 gives the running time of Algorithm 1. 
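
The recurrence can be sketched in Python as a memoized recursion over 0-based index quadruples. This is an illustration under stated assumptions, not the paper's procedure verbatim: the name `lcps_length` is hypothetical, memoization replaces the explicit bottom-up table, and a single-character substring whose end characters do not all match falls through to the `max` branch, so that a case such as X = "a", Y = "bab" (where the matching character is not at an end of Y) still yields 1.

```python
from functools import lru_cache


def lcps_length(x: str, y: str) -> int:
    """Length of a longest common palindromic subsequence of x and y."""

    @lru_cache(maxsize=None)
    def lcps(i: int, j: int, k: int, l: int) -> int:
        # Empty substring on either side: nothing in common.
        if i > j or k > l:
            return 0
        if x[i] == x[j] == y[k] == y[l]:
            # A single-character substring can contribute at most one symbol.
            if i == j or k == l:
                return 1
            # Use the common end symbol on both sides of the palindrome.
            return 2 + lcps(i + 1, j - 1, k + 1, l - 1)
        # Otherwise drop one end character of one substring.
        return max(
            lcps(i + 1, j, k, l),
            lcps(i, j - 1, k, l),
            lcps(i, j, k + 1, l),
            lcps(i, j, k, l - 1),
        )

    return lcps(0, len(x) - 1, 0, len(y) - 1)
```

The number of distinct argument quadruples is at most of order $n^4$, in line with the table size discussed above. The recursion depth grows linearly with the input length, so very long inputs would need an iterative table instead.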

Theorem 2. LCPSLength(X, Y) computes the length of an LCPS of $X$ and $Y$ in $O(n^4)$ time. 

Proof. The initialization step takes $O(n^4)$ time. As the algorithm proceeds, it computes the LCPS of substrings 
of $X$ and $Y$ in such a way that substrings of length $v$ are considered before substrings of length $v + 1$. 
Now, there are $O(n^2)$ possible pairs of lengths between $X$ and $Y$. For each of these pairs there are $O(n^2)$ 
possible pairs of start positions. So the four nested loops in Lines 15-18 require $O(n^4)$ iterations, and each table 
entry takes $O(1)$ time to compute. So the table computation takes $O(n^4)$ time. □ 

We can use the lengths computed in the $lcps$ table returned by LCPSLength to construct an LCPS of $X$ and $Y$. We 
simply begin at $lcps[1, n, 1, n]$ and trace back through the table. As soon as we find that $x_i = x_j = y_k = y_\ell$, 
we have found an element of the LCPS, and we recursively try to find an LCPS of $X_{i+1,j-1}$ and $Y_{k+1,\ell-1}$. Otherwise, 
we find the maximum value in the $lcps$ table among $(X_{i+1,j}, Y_{k,\ell})$, $(X_{i,j-1}, Y_{k,\ell})$, $(X_{i,j}, Y_{k+1,\ell})$ and 
$(X_{i,j}, Y_{k,\ell-1})$, and then use that value to compute subsequent members of the LCPS recursively. Since each 
recursive call either increments one of $i$, $k$ or decrements one of $j$, $\ell$, this procedure takes $O(n)$ time to 
construct an LCPS of $X$ and $Y$. 
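
The trace-back can be sketched as follows. The block is self-contained and therefore repeats a memoized length function; the name `lcps_string`, the 0-based indexing and the tie-breaking order among the four subproblems are assumptions made for this illustration only.

```python
from functools import lru_cache


def lcps_string(x: str, y: str) -> str:
    """Return one longest common palindromic subsequence of x and y."""

    @lru_cache(maxsize=None)
    def lcps(i, j, k, l):
        if i > j or k > l:
            return 0
        if x[i] == x[j] == y[k] == y[l]:
            return 1 if (i == j or k == l) else 2 + lcps(i + 1, j - 1, k + 1, l - 1)
        return max(lcps(i + 1, j, k, l), lcps(i, j - 1, k, l),
                   lcps(i, j, k + 1, l), lcps(i, j, k, l - 1))

    left, middle = [], ""
    i, j, k, l = 0, len(x) - 1, 0, len(y) - 1
    while i <= j and k <= l:
        if x[i] == x[j] == y[k] == y[l]:
            if i == j or k == l:
                middle = x[i]          # centre of an odd-length palindrome
                break
            left.append(x[i])          # symbol used at both ends
            i, j, k, l = i + 1, j - 1, k + 1, l - 1
        else:
            best = lcps(i, j, k, l)
            # Move to a neighbouring subproblem that keeps the optimum.
            if lcps(i + 1, j, k, l) == best:
                i += 1
            elif lcps(i, j - 1, k, l) == best:
                j -= 1
            elif lcps(i, j, k + 1, l) == best:
                k += 1
            else:
                l -= 1
    half = "".join(left)
    return half + middle + half[::-1]
```

Each loop iteration shrinks at least one of the two substrings by one character, so the loop performs a number of steps linear in the total input length once the lengths are available.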

4 A Second Approach 

In this section, we present a second approach to solving the LCPS problem efficiently. In particular, we 
first reduce our problem to a geometry problem and then solve it with the help of a balanced binary search 
tree data structure. The resulting algorithm runs in $O(\mathcal{R}^2 \log^3 n)$ time. Recall that $\mathcal{R}$ is the number of 
ordered pairs of positions at which the two strings match. First we make the following claim. 

Claim 1. Any common palindromic subsequence $Z = z_1 z_2 \ldots z_u$ of two strings $X$ and $Y$ can be decomposed 
into a set of $\sigma$-match pairs ($\sigma \in \Sigma$). 

Proof. Since $Z$ is a palindrome, we have $z_i = z_{u-i+1}$ for $1 \le i \le \lceil u/2 \rceil$. Since $Z$ is common to both 
$X$ and $Y$, each $z_i$, $1 \le i \le u$, corresponds to a $\sigma$-match between $X$ and $Y$. Therefore, $z_i$ and $z_{u-i+1}$ form a 
$\sigma$-match pair. We can thus obtain $\sigma$-match pairs by pairing up $z_i$ and $z_{u-i+1}$ for all $1 \le i \le \lceil u/2 \rceil$. So 
we have decomposed $Z$ into a set of $\sigma$-match pairs. □ 

It follows from Claim 1 that constructing a common palindromic subsequence of two strings can be seen 
as constructing an appropriate set of $\sigma$-match pairs between the input strings. An arbitrary pair of $\sigma$-matches 
$((i, j), (k, \ell))$ (say $m_1$), from among all pairs of $\sigma$-matches between a pair of strings, can be seen as inducing 
a substring pair in the input strings. Now suppose we want to construct a common palindromic subsequence 
$Z$ of length $u$ with $m_1$ at the two ends of $Z$. Since $(i, j)$ and $(k, \ell)$ are matches, we have 
$z_1 = z_u = x_i = y_j = x_k = y_\ell$. Then, to compute $Z$, we have to recursively select $\sigma$-match pairs from within 
the induced substrings $X_{i,k}$ and $Y_{j,\ell}$. In this way we get a set of $\sigma$-match pairs which corresponds to a 
common palindromic subsequence of the input strings. If we consider all possible $\sigma$-match pairs as the two end 
points of the common palindromic subsequence, then the longest one obtained among all these will be an LCPS 
of the input strings. This is the basic idea for constructing an LCPS in our new approach. 

To compute $\mathcal{M}_\sigma$ for any $\sigma \in \Sigma$, we first linearly scan $X$ and $Y$ to compute two arrays, $X_\sigma$ and $Y_\sigma$, 
which contain the indices in $X$ and $Y$ where $\sigma$ occurs. Then we take each pair between the two arrays to 
get all the ordered pairs where $\sigma$ occurs in both strings. 
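
A minimal Python sketch of this step is given below. The function name `matches_by_symbol` and the dictionary layout are assumptions for illustration; positions are 1-based to agree with the notation of the text, and a single scan of each string collects the position arrays for all symbols at once.

```python
from collections import defaultdict


def matches_by_symbol(x: str, y: str) -> dict:
    """Map each symbol sigma to its list of sigma-matches (i, j), 1-based."""
    x_pos, y_pos = defaultdict(list), defaultdict(list)
    for i, ch in enumerate(x, start=1):
        x_pos[ch].append(i)            # positions of ch in X
    for j, ch in enumerate(y, start=1):
        y_pos[ch].append(j)            # positions of ch in Y
    # Pair every occurrence in X with every occurrence in Y of the same symbol.
    return {
        s: [(i, j) for i in x_pos[s] for j in y_pos[s]]
        for s in x_pos
        if s in y_pos
    }
```

For X = "aba" and Y = "ab" this yields the a-matches (1, 1) and (3, 1) and the b-match (2, 2), so the total number of matches is 3.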



4.1 Mapping the LCPS Problem to a Geometry Problem 

Each match between the strings $X$ and $Y$ can be visualized as a point on an $n \times n$ rectangular grid where all the 
coordinates have integer values. Then, any rectangle in the grid corresponds to a pair of substrings of $X$ and 
$Y$. Any $\sigma$-match pair defines two corner points of a rectangle and thus induces a rectangle in the grid. Now, 
our goal is to take a pair of $\sigma$-matches as the two ends of a common palindromic subsequence and recursively 
construct the set of pairs of $\sigma$-matches from within the induced substrings. Clearly, the rectangle induced by 
a pair of $\sigma$-matches will in turn contain some points (i.e., matches) as well. We recursively continue within 
the induced sub-rectangles to find the LCPS between the substrings induced by the rectangles. When the 
recursion unfolds, we append the $\sigma$-match pair to the obtained sequence to get the LCPS that can be obtained 
with our $\sigma$-match pair corresponding to the two ends. Clearly, if we do this procedure for all such possible 
$\sigma$-match pairs, then the longest of them will be our desired LCPS between the two strings. The terminating 
conditions of this recursive procedure are: 

T1. There is no point within the rectangle. This corresponds to the case where at least one of the substrings 
is empty. 

T2. It is not possible to take any pair of $\sigma$-matches within the rectangle. This corresponds to the single 
character case in our dynamic programming solution. 

So, in summary we do the following. 

1. Identify an induced rectangle (say $\Psi_1$) by a pair of $\sigma$-matches. 

2. Pair up $\sigma$-matches within $\Psi_1$ to obtain another rectangle (say $\Psi_2$), and so on until we encounter either of 
the two terminating conditions T1 or T2. 

3. We repeat the above for all possible $\sigma$-match pairs ($\forall \sigma \in \Sigma$). 

4. At this point, we have a set of nested rectangle structures. 

5. Here, an increase in the nesting depth of the rectangle structures as they are being constructed corresponds 
to adding a pair of symbols (see footnote 2) to the resultant palindromic subsequence. Hence, the set of rectangles with 
maximum nesting depth gives us an LCPS. 

Now our problem reduces to the following interesting geometric problem: given a set of nested rectangles 
defined by the $\sigma$-match pairs, $\forall \sigma \in \Sigma$, we need to find the set of rectangles having the maximum nesting 
depth. 

In what follows, we will refer to this problem as the Maximum Depth Nesting Rectangle Structures 
(MDNRS) problem. 

4.2 A Solution to the MDNRS Problem 

A $\sigma$-match pair $((i, j), (k, \ell))$ basically represents a 2-dimensional rectangle (say $\Psi$). Assume, without 
loss of generality, that $(i, j)$ and $(k, \ell)$ correspond to the lower left corner and upper right corner of $\Psi$, 
respectively. In what follows, depending on the context, we will sometimes use $((i, j), (k, \ell))$ to denote the 
corresponding rectangle. Now, a rectangle $\Psi'(((i', j'), (k', \ell')))$ is nested within a rectangle $\Psi(((i, j), (k, \ell)))$ 
iff the following condition holds: 

$$
\begin{aligned}
& i' > i \text{ and } j' > j \text{ and } k' < k \text{ and } \ell' < \ell \\
\Leftrightarrow\ & i' > i \text{ and } j' > j \text{ and } -k' > -k \text{ and } -\ell' > -\ell \\
\Leftrightarrow\ & (i', j', -k', -\ell') > (i, j, -k, -\ell).
\end{aligned}
$$

Now we convert a 2-dimensional rectangle $\Psi(((i, j), (k, \ell)))$ to a 4-dimensional point $P_\Psi(i, j, -k, -\ell)$. 

We say that a point $(x, y, z, w)$ is chained to another point $(x', y', z', w')$ iff $(x, y, z, w) > (x', y', z', w')$, 
where the comparison holds in every coordinate. 



2 If condition T2 is reached, only a single symbol is added. 

Then, it is easy to see that a rectangle $\Psi'(((i', j'), (k', \ell')))$ is nested within a rectangle $\Psi(((i, j), (k, \ell)))$ 
iff the point $P_{\Psi'}(i', j', -k', -\ell')$ is chained to the point $P_\Psi(i, j, -k, -\ell)$. Hence, the problem of finding 
the set of rectangles in 2 dimensions having the maximum nesting depth easily reduces to finding the set of 
corresponding points in 4 dimensions having the maximum chain length. 
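
The equivalence between nesting and chaining can be checked mechanically. In the Python sketch below, the helper names `to_point`, `is_nested` and `is_chained` are hypothetical, and a rectangle is represented as a pair of corner tuples ((i, j), (k, l)).

```python
def to_point(rect):
    """Map rectangle ((i, j), (k, l)) to the 4-D point (i, j, -k, -l)."""
    (i, j), (k, l) = rect
    return (i, j, -k, -l)


def is_nested(inner, outer) -> bool:
    """True if `inner` lies strictly inside `outer` on all four sides."""
    (i2, j2), (k2, l2) = inner
    (i1, j1), (k1, l1) = outer
    return i2 > i1 and j2 > j1 and k2 < k1 and l2 < l1


def is_chained(p, q) -> bool:
    """True if p is strictly greater than q in every coordinate."""
    return all(a > b for a, b in zip(p, q))
```

For example, ((2, 2), (4, 4)) is nested within ((1, 1), (5, 5)), and correspondingly (2, 2, -4, -4) is greater than (1, 1, -5, -5) in every coordinate.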

First we give a solution to this problem in 2 dimensions. Later we shall extend our solution to 4 
dimensions. In 2D our points are of the form $(x, y)$. We maintain a 1-dimensional balanced binary 
search tree $T$ that contains the $x$-coordinates of the points, each along with a value, as the points are being 
processed. The value indicates the length of the longest chain that can be formed starting from any point with 
that $x$-coordinate. Initially $T$ is empty. We process the points in non-increasing order of their $y$-coordinates. 
For each point $(x, y)$ we query $T$ for the smallest $x'$ that is greater than $x$ 
(i.e., a successor query). If the value corresponding to $x'$ is $K$, then we can construct a chain of length $K + 1$ 
starting from the point $(x, y)$, in which $(x, y)$ immediately precedes a point with $x'$ as its $x$-coordinate. Now 
we insert $x$ into the tree with corresponding value $K + 1$, or update its value if $x$ is already present. Since $T$ 
is balanced, any insertion, deletion or successor query can be done in $O(\log n)$ time. The maximum value in $T$ 
is the maximum length of a chain, which in turn yields the length of an LCPS between the input sequences. If we 
also store at $x$ the point $(x, y)$ that yields the maximum chain length, then we can use it to trace the chain later in 
linear time to get the sequence as well. 
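
The 2-dimensional step can be sketched in Python as follows. This is an illustrative variant rather than the exact procedure described above: it substitutes a Fenwick tree over compressed $x$-coordinates for the balanced binary search tree, it asks for the maximum stored value over all $x' > x$ rather than only the successor's value, and it handles points with equal $y$ together so that they are never chained to each other. The function name `longest_chain_2d` and these choices are assumptions made for the example.

```python
from itertools import groupby


def longest_chain_2d(points):
    """Length of the longest sequence of points strictly increasing in x and y."""
    xs = sorted({x for x, _ in points})
    rank = {x: r for r, x in enumerate(xs)}   # 0-based rank of each x
    m = len(xs)
    tree = [0] * (m + 1)                      # Fenwick tree of prefix maxima

    def update(pos, value):                   # pos is 1-based
        while pos <= m:
            tree[pos] = max(tree[pos], value)
            pos += pos & -pos

    def prefix_max(pos):                      # maximum over positions 1..pos
        best = 0
        while pos > 0:
            best = max(best, tree[pos])
            pos -= pos & -pos
        return best

    best_overall = 0
    ordered = sorted(set(points), key=lambda p: -p[1])   # non-increasing y
    for _, group in groupby(ordered, key=lambda p: p[1]):
        # Query the whole group first so equal-y points never chain together.
        results = []
        for x, _ in group:
            rev = m - rank[x]                 # larger x gives smaller rev
            results.append((rev, 1 + prefix_max(rev - 1)))
        for rev, length in results:
            update(rev, length)
            best_overall = max(best_overall, length)
    return best_overall
```

Each query and update touches a logarithmic number of tree cells, so the sketch keeps the per-point cost logarithmic in the number of distinct $x$-coordinates.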

We can extend our 1-dimensional balanced binary search tree to $d$ dimensions in the form of multi-level trees 
using an inductive definition on $d$. In $d$ dimensions we store the points in $T$ with respect to their 
$x_d$-coordinates. With each node $u$ of $T$, we associate a $(d-1)$-dimensional multi-level balanced binary search 
tree with respect to $(x_1, x_2, \ldots, x_{d-1})$. During insertion, deletion and search operations for $d$-dimensional 
points, we also perform the same operation recursively in the $(d-1)$-dimensional trees. By induction on $d$ it 
can be shown that insertion, deletion and searching in this balanced multi-level binary search 
tree can be done in $O(\log^d n)$ time. 

Finally, to solve our problem we simply use a 3-dimensional balanced multi-level binary search tree. 
We process the points $(x, y, z, w)$ in non-increasing order of the highest dimension $w$. For each point 
$(x, y, z, w)$ we query the tree for $(x', y', z')$ such that $x' > x$, $y' > y$, $z' > z$ and $x'$, $y'$, $z'$ are the smallest 
numbers greater than $x$, $y$ and $z$ respectively. The rest of the process is the same. 
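
To make the reduction concrete on small inputs, the following Python sketch computes the LCPS length through nested rectangles. It is one possible reading of the approach, under assumptions that are not spelled out in the text: only match pairs $((i, j), (k, \ell))$ with $i < k$ and $j < \ell$ are treated as proper rectangles (adding two symbols), a single match is treated as a degenerate rectangle (adding one symbol, as in condition T2), and a plain quadratic search over rectangles replaces the multi-level search tree. The name `lcps_length_by_nesting` is hypothetical.

```python
from collections import defaultdict


def lcps_length_by_nesting(x: str, y: str) -> int:
    """LCPS length via nested rectangles, using a quadratic chain search."""
    # Group sigma-matches (i, j) with x[i] == y[j] by symbol (0-based indices).
    by_symbol = defaultdict(list)
    for i, a in enumerate(x):
        for j, b in enumerate(y):
            if a == b:
                by_symbol[a].append((i, j))

    # Each rectangle is (i, j, k, l, weight): a proper pair of matches adds
    # two symbols, a single match (degenerate rectangle) adds one.
    rects = []
    for ms in by_symbol.values():
        for (i, j) in ms:
            rects.append((i, j, i, j, 1))
            for (k, l) in ms:
                if i < k and j < l:
                    rects.append((i, j, k, l, 2))

    # Narrower rectangles first, so anything nested inside r is already solved.
    rects.sort(key=lambda r: r[2] - r[0])
    best = [0] * len(rects)
    for a, (i, j, k, l, w) in enumerate(rects):
        inner = 0
        for b in range(a):
            i2, j2, k2, l2, _ = rects[b]
            if i2 > i and j2 > j and k2 < k and l2 < l:   # strictly nested
                inner = max(inner, best[b])
        best[a] = w + inner
    return max(best, default=0)
```

With X = "abac" and Y = "caba", the only proper rectangle is formed by the a-matches (0, 1) and (2, 3), and the b-match (1, 2) lies strictly inside it, giving length 2 + 1 = 3.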

Algorithm 2 outlines the LCPS-New procedure, which takes as input two strings $X$ and $Y$, each of length 
$n$, and the alphabet $\Sigma$. The following theorem gives the worst-case running time of the LCPS-New procedure. 

Theorem 3. The LCPS-New procedure computes an LCPS of strings $X$ and $Y$ in $O(\mathcal{R}^2 \log^3 n)$ time. 

Proof. Since there are $\mathcal{R}$ matches between $X$ and $Y$, we have $O(\mathcal{R}^2)$ rectangles. Therefore, there are 
$O(\mathcal{R}^2)$ points in 4 dimensions. Since $\mathcal{R} = O(n^2)$ in the worst case, sorting the points with a comparison 
sort requires $O(\mathcal{R}^2 \log \mathcal{R}^2) = O(\mathcal{R}^2 \log n)$ time. Since the coordinate values are bounded within the 
range 1 to $n$, we can instead sort them using the counting sort algorithm, which reduces the sorting time to 
$O(\mathcal{R}^2)$. Constructing a 3-dimensional multi-level balanced binary search tree from $O(\mathcal{R}^2)$ points takes 
$O(\mathcal{R}^2 \log^3 \mathcal{R}^2) = O(\mathcal{R}^2 \log^3 n)$ time. Each query in the tree can be done in $O(\log^3 n)$ time. Now, for 
$O(\mathcal{R}^2)$ points, a total of $O(\mathcal{R}^2)$ queries are made, which takes a total of $O(\mathcal{R}^2 \log^3 n)$ time. Therefore, 
the overall running time of our algorithm is $O(\mathcal{R}^2 \log^3 n)$. □ 

Since $\mathcal{R} = O(n^2)$, the running time of our algorithm becomes $O(n^4 \log^3 n)$ in the worst case, which 
is not better than that of the dynamic programming algorithm ($O(n^4)$). But in cases where we have 
$\mathcal{R} = O(n)$ it performs much better: the running time reduces to $O(n^2 \log^3 n)$. Even for 
$\mathcal{R} = O(n^{1.5})$ this algorithm performs better ($O(n^3 \log^3 n)$) than the DP algorithm. 
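
The comparison can be made explicit with a short calculation. Writing $\mathcal{R} = n^{\alpha}$ for some exponent $0 \le \alpha \le 2$ (the parameter $\alpha$ is introduced here only for illustration), the ratio of the two bounds is:

```latex
\[
\frac{\mathcal{R}^2 \log^3 n}{n^4}
  = \frac{n^{2\alpha} \log^3 n}{n^4}
  = \frac{\log^3 n}{n^{4 - 2\alpha}} .
\]
```

For every fixed $\alpha < 2$ this ratio tends to 0 as $n$ grows, so the bound of the second algorithm is asymptotically smaller; for $\alpha = 1.5$ the ratio is $\log^3 n / n$, and for $\alpha = 2$ it is $\log^3 n$, which is the worst case noted above.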

Algorithm 2 LCPS-New(X, Y, Σ) 

for each σ ∈ Σ do 
  M_σ ← ∅ 
  X_σ ← ∅ 
  Y_σ ← ∅ 
  for i = 1 to n do 
    if X[i] = σ then 
      X_σ ← X_σ ∪ {i} 
    end if 
    if Y[i] = σ then 
      Y_σ ← Y_σ ∪ {i} 
    end if 
  end for 
  for i = 1 to |X_σ| do 
    for j = 1 to |Y_σ| do 
      M_σ ← M_σ ∪ {(X_σ[i], Y_σ[j])} 
    end for 
  end for 
end for 
Rectangles ← ∅   {Rectangles contains the set of all rectangles} 
for each σ ∈ Σ do 
  for each match (i, j) ∈ M_σ do 
    for each match (k, ℓ) ∈ M_σ do 
      Rectangles ← Rectangles ∪ {((i, j), (k, ℓ))} 
    end for 
  end for 
end for 
P ← ∅ 
for each ((i, j), (k, ℓ)) ∈ Rectangles do 
  P ← P ∪ {(i, j, -k, -ℓ)} 
end for 
Sort the points in P in non-increasing order of the 4th dimension 
Initialize the multi-level balanced binary search tree T as an empty tree 
for each point p(i, j, k, ℓ) ∈ P do 
  Find the point (i', j', k') in T such that i' > i and j' > j and k' > k and i', j', k' are the smallest integers greater than i, j and k respectively 
  K ← the value stored at (i', j', k') 
  if (i, j, k) exists in T then 
    Update the value of (i, j, k) with K + 1 
  else 
    Insert the node (i, j, k) with value K + 1 
  end if 
  Also store (i', j', k') in T at the node (i, j, k) as its successor 
end for 
lcps ← maximum value stored in T 
LCPS ← trace the successors to obtain the sequence 
return LCPS 

5 Conclusion and Future Works 

In this paper, we have introduced and studied the longest common palindromic subsequence (LCPS) problem, 
which is a variant of the classic LCS problem. We have first presented a dynamic programming algorithm 
to solve it, which runs in $O(n^4)$ time. Then, we have identified and studied an interesting relation 
of the problem with computational geometry and devised an $O(\mathcal{R}^2 \log^3 n)$ time algorithm. In our results, we 
have assumed that the two input strings are of equal length $n$. However, our results can easily be extended 
to the case where the two input strings are of different lengths. To the best of our knowledge, this is the 
first attempt in the literature to solve this problem. Further research can also be carried out on 
other variants of the LCPS problem. 